Word Frequency


Method

This plot shows the most frequently used words across the entire corpus of tweets.
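Corpus-wide word frequency comes down to tokenizing each tweet, dropping stop words and counting what is left. The sketch below is a minimal Python illustration, not the report's own (R-based) pipeline. The stop-word list, the URL stripping and the token pattern are assumptions, and the example tweets are made up.

```python
import re
from collections import Counter

# Hypothetical stop-word list: the report does not say which list was used.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "are", "at", "by", "rt"}


def tokenize(text):
    """Lower-case a tweet, drop URLs, and split it into word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    # Keep letters, digits and apostrophes; '#' and '@' are stripped.
    return [w for w in re.findall(r"[a-z0-9']+", text) if w not in STOP_WORDS]


def word_frequency(tweets, top_n=10):
    """Count words over the whole corpus and return the top_n most common."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts.most_common(top_n)


# Made-up example tweets, not taken from the dataset.
tweets = [
    "Level3Lockdown: schools in the WC reopen",
    "Test results for the WC are delayed",
    "Stay safe and get tested #level3lockdown",
]
print(word_frequency(tweets, top_n=2))  # [('level3lockdown', 2), ('wc', 2)]
```

No stemming is applied here, so "test" and "tested" are counted as different words.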

Word Frequency Per Media House


Method

This plot shows the most frequently used words across the entire corpus of tweets, grouped by Twitter handle.
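Grouping by Twitter handle means keeping a separate counter per account instead of one for the whole corpus. A minimal Python sketch, assuming each tweet arrives as a hypothetical `(handle, text)` pair; the stop-word list and sample tweets are illustrative only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list: the report does not say which list was used.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "are", "at", "by", "rt"}


def tokenize(text):
    """Lower-case a tweet, drop URLs, and split it into word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [w for w in re.findall(r"[a-z0-9']+", text) if w not in STOP_WORDS]


def word_frequency_by_handle(tweets, top_n=5):
    """Count words separately for each Twitter handle.

    `tweets` is an iterable of (handle, text) pairs.
    """
    counts = defaultdict(Counter)
    for handle, text in tweets:
        counts[handle].update(tokenize(text))
    return {handle: c.most_common(top_n) for handle, c in counts.items()}


# Made-up example tweets, not taken from the dataset.
tweets = [
    ("GovernmentZA", "Support is available for those in need"),
    ("GovernmentZA", "Call the support line for help"),
    ("News24", "Testing backlog grows as test results are delayed"),
]
for handle, top in word_frequency_by_handle(tweets, top_n=1).items():
    print(handle, top)
```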

Description

The word frequencies show that the Government's tweets focus on supportive language. Some of the news agencies use words referring to themselves most often; however, words such as “wc”, “level3lockdown” and “test” may give more insight into the topics being discussed. In the next section the words are grouped into bi-grams to build more context around which topics could be present.

Bi-Gram Frequency


Method

Each word is paired with the word that immediately follows it, forming two-word groups (bi-grams), and these groups are then counted.
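The pairing step can be sketched in a few lines of Python. Two choices here are assumptions rather than details from the report: bi-grams are formed only within a single tweet, and pairs containing a stop word are dropped after pairing (removing stop words first would glue together words that were never adjacent). The stop-word list and tweets are illustrative.

```python
import re
from collections import Counter

# Hypothetical stop-word list: the report does not say which list was used.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "are", "at", "by"}


def bigrams(text):
    """Pair every word in a tweet with the word that immediately follows it."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return list(zip(words, words[1:]))


def bigram_frequency(tweets, top_n=10):
    """Count bi-grams over all tweets, skipping pairs that contain a stop word."""
    counts = Counter()
    for tweet in tweets:
        # Pairs are formed within one tweet, never across tweet boundaries.
        for first, second in bigrams(tweet):
            if first not in STOP_WORDS and second not in STOP_WORDS:
                counts[f"{first} {second}"] += 1
    return counts.most_common(top_n)


# Made-up example tweets, not taken from the dataset.
tweets = [
    "The testing backlog keeps growing",
    "Minister explains the testing backlog",
    "Foreign nationals affected by lockdown",
]
print(bigram_frequency(tweets, top_n=1))  # [('testing backlog', 2)]
```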

Description

Looking at the bi-gram frequency, one can ascertain that News24 frequently reported on the testing backlog, eNCA on the lockdown and briefings, and EWN on foreign nationals, while the Government posted more about supportive measures.

Optimal K For Topic Model


Method

The “ldatuning” package implements four metrics, “Griffiths2004”, “CaoJuan2009”, “Arun2010” and “Deveaud2014”, for selecting the optimal number of topics for an LDA model. The number of CPU cores to use can be specified to speed up the calculation; the larger the dataset, the longer the results take to compute. For more information on this method and on the various metrics for obtaining the optimal number of topics K, visit: https://cran.r-project.org/web/packages/ldatuning/vignettes/topics.html or https://eight2late.wordpress.com/2015/09/29/a-gentle-introduction-to-topic-modeling-using-r/
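To make one of these metrics concrete, the sketch below computes the “CaoJuan2009” score as the average pairwise cosine similarity between the topics' word distributions. In ldatuning, “CaoJuan2009” and “Arun2010” are minimised while “Griffiths2004” and “Deveaud2014” are maximised. ldatuning derives the score from a fitted model; this standalone Python version assumes the topic-word matrix is passed in directly as plain probabilities, and the toy matrices are invented.

```python
import math
from itertools import combinations


def cosine_similarity(x, y):
    """Cosine of the angle between two word-probability vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norms


def cao_juan_2009(topic_word):
    """Average pairwise cosine similarity between the K topic-word rows.

    topic_word[k][v] is P(word v | topic k); at least two topics are required.
    Lower values mean the topics overlap less, so the metric is minimised.
    """
    pairs = list(combinations(range(len(topic_word)), 2))
    total = sum(cosine_similarity(topic_word[i], topic_word[j]) for i, j in pairs)
    return total / len(pairs)  # len(pairs) == K * (K - 1) / 2


# Toy topic-word matrices over a four-word vocabulary (illustrative only).
distinct = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
overlapping = [[0.4, 0.4, 0.1, 0.1], [0.1, 0.4, 0.4, 0.1]]
print(cao_juan_2009(distinct))  # 0.0
print(cao_juan_2009(overlapping))
```

Running this for models fitted with a range of K values and plotting the scores gives the curve in which one looks for an elbow.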

Description

The plot shows that the “Griffiths2004”, “Arun2010” and “Deveaud2014” metrics are not informative for this dataset. To find the optimal number of topics K, one looks for an “elbow”, a point where the curve changes abruptly. According to the “CaoJuan2009” metric, the optimal number of topics for the tweet dataset therefore lies between 4 and 10.

References

  1. Rajkumar Arun, V. Suresh, C. E. Veni Madhavan, and M. N. Narasimha Murthy. 2010. On finding the natural number of topics with latent Dirichlet allocation: Some observations. In Advances in knowledge discovery and data mining, Mohammed J. Zaki, Jeffrey Xu Yu, Balaraman Ravindran and Vikram Pudi (eds.). Springer Berlin Heidelberg, 391–402. http://doi.org/10.1007/978-3-642-13657-3_43

  2. Cao Juan, Xia Tian, Li Jintao, Zhang Yongdong, and Tang Sheng. 2009. A density-based method for adaptive LDA model selection. Neurocomputing: 16th European Symposium on Artificial Neural Networks 2008 72, 7–9: 1775–1781. http://doi.org/10.1016/j.neucom.2008.06.011

  3. Romain Deveaud, Éric SanJuan, and Patrice Bellot. 2014. Accurate and effective latent concept modeling for ad hoc information retrieval. Document numérique 17, 1: 61–84. http://doi.org/10.3166/dn.17.1.61-84

  4. Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences 101, suppl 1: 5228–5235. http://doi.org/10.1073/pnas.0307752101

  5. Martin Ponweiser. 2012. Latent Dirichlet allocation in R. Retrieved from http://epub.wu.ac.at/id/eprint/3558

Topic Model For All Tweets


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with K = 8, a value chosen using the optimal-K method described above. The model's “beta” matrix was used to examine the per-topic-per-word probabilities.
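The “beta” matrix holds, for each topic, a probability for every word in the vocabulary, and topics are usually labelled by reading off their highest-probability words. The Python sketch below shows that step. The vocabulary and the two-topic matrix are invented for illustration and are not output of the report's model.

```python
def top_terms(beta, vocab, n=5):
    """Return the n highest-probability words for each topic.

    `beta` is a K x V matrix where beta[k][v] = P(word v | topic k).
    """
    result = []
    for row in beta:
        ranked = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)
        result.append([(vocab[v], row[v]) for v in ranked[:n]])
    return result


# Invented two-topic example; each row sums to 1.
vocab = ["alcohol", "sales", "flights", "school", "court", "regulations"]
beta = [
    [0.40, 0.35, 0.05, 0.05, 0.10, 0.05],
    [0.02, 0.03, 0.10, 0.50, 0.20, 0.15],
]
for k, terms in enumerate(top_terms(beta, vocab, n=2), start=1):
    print(f"Topic {k}:", ", ".join(word for word, _ in terms))
# Topic 1: alcohol, sales
# Topic 2: school, court
```

Assigning a name such as “Lifted restrictions on alcohol sales” to a topic is still a manual judgement made from lists like these.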

Description

Looking at the topic model for all the tweets, one can identify the following eight topics during the period 19 May to 18 June 2020:

  • Statistics on Covid-19
  • The Ministers’ briefings
  • Lifted restrictions on flights
  • Hospitals
  • The President and the Level 3 lockdown
  • Schools and education in the Western Cape
  • Lifted restrictions on alcohol sales
  • Court case surrounding the regulations

Topic Model for Media24


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix in order to examine per-topic-per-word probabilities.
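Each per-handle model follows the same recipe: keep only that handle's tweets, tokenize them, and fit LDA with a fixed K. The report does not say how the models were estimated; if they were fitted with R's topicmodels package (which ldatuning builds on), its default method is variational EM. The Python sketch below swaps in collapsed Gibbs sampling instead, a different but standard way to fit LDA. The function name `lda_gibbs`, the priors `alpha` and `eta`, the iteration count and the sample documents are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, k, iterations=200, alpha=0.1, eta=0.01, seed=1):
    """Fit LDA by collapsed Gibbs sampling (a swapped-in estimation method).

    `docs` is a list of token lists. Returns (vocab, beta), where beta[t][v]
    estimates P(word v | topic t).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * k for _ in docs]       # tokens in doc d assigned to topic t
    topic_word = [[0] * V for _ in range(k)]  # times word v is assigned to topic t
    topic_total = [0] * k                     # tokens assigned to topic t
    assignments = []
    # Start with a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            t = rng.randrange(k)
            z_doc.append(t)
            doc_topic[d][t] += 1
            topic_word[t][index[w]] += 1
            topic_total[t] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = index[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_word[t][v] -= 1
                topic_total[t] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][s] + alpha) * (topic_word[s][v] + eta)
                    / (topic_total[s] + V * eta)
                    for s in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][v] += 1
                topic_total[t] += 1
    # Smoothed per-topic word distributions (the "beta" matrix).
    beta = [
        [(topic_word[t][v] + eta) / (topic_total[t] + V * eta) for v in range(V)]
        for t in range(k)
    ]
    return vocab, beta


# Made-up, already tokenized tweets standing in for one handle's corpus.
docs = [
    "schools reopen western cape learners".split(),
    "western cape schools teachers learners".split(),
    "president speech live tonight".split(),
    "president live address lockdown".split(),
    "covid cases deaths recoveries".split(),
    "new covid cases recoveries deaths".split(),
]
vocab, beta = lda_gibbs(docs, k=4, iterations=100)
for t, row in enumerate(beta, start=1):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
    print(f"Topic {t}:", ", ".join(vocab[v] for v in top))
```

With so few documents the topics are unstable from seed to seed; the sketch only shows the mechanics, not results comparable to the report's.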

Description

Looking at Media24’s Topic Model one can identify the following 4 topic areas during the period of 19 May to 18 June 2020:

  • Reports on the Western Cape
  • The situation surrounding schools
  • The President’s live speeches
  • Reports on Covid-19 statistics

Topic Model For EWNupdates


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix in order to examine per-topic-per-word probabilities.

Description

Looking at EWNupdates’s Topic Model one can identify the following 4 topic areas during the period of 19 May to 18 June 2020:

  • Information surrounding deaths
  • Ministers’ live briefings
  • Reports on Western Cape’s situation
  • Promotion of one of their TV shows

Topic Model for eNCA


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix in order to examine per-topic-per-word probabilities.

Description

Looking at eNCA’s Topic Model one can identify the following 4 topic areas during the period of 19 May to 18 June 2020:

  • Ministers and President’s Live briefings
  • Lockdown level 3 regulations
  • Reports on Western Cape’s schools
  • Court proceedings about lockdown regulations

Topic Model For SABC News


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 6. SABC News had the most tweets overall and was the only media house modelled with 6 topics, as six topics yielded mixed results for the other media houses. This model uses a “beta” matrix in order to examine per-topic-per-word probabilities.

Description

Looking at SABC News Topic Model one can identify the following 6 topic areas during the period of 19 May to 18 June 2020:

  • Statistics on the nation’s situation
  • Statistics on Western Cape’s situation
  • Court proceedings about lockdown regulations
  • Reports on schools
  • World health and South Africans’ return to work
  • Live briefings on regulations

Topic Model for GovernmentZA


Method

This topic model was built using LDA (“Latent Dirichlet Allocation”) with a K parameter of 4. This model uses a “beta” matrix in order to examine per-topic-per-word probabilities.

Description

Looking at GovernmentZA’s Topic Model one can identify the following 3 topic areas during the period of 19 May to 18 June 2020:

  • Support Services
  • Ministers and President’s Live briefings
  • Spread of virus due to gatherings

Topic number 3 is still somewhat ambiguous.